Language embeddings that preserve staging and safety 



Todd L. Veldhuizen, Chalmers University of Technology 



Abstract. We study embeddings of programming languages into one another that
preserve what reductions take place at compile time, i.e., staging. A certain
condition, which we call a 'Turing-complete kernel', is sufficient for a
language to be stage-universal in the sense that any language may be embedded
in it while preserving staging. A similar line of reasoning yields the notion
of safety-preserving embeddings, and a useful characterization of
safety-universality. Languages universal with respect to staging and safety are
good candidates for realizing domain-specific embedded languages (DSELs) and
'active libraries' that provide domain-specific optimizations and safety checks.

1. Introduction 

Embeddings of programming languages into one another are useful in studying
their relative power and, sometimes, finding languages that are universal in
some sense. Examples include Turing-reducibility for studying computability,
poly-time reductions for subrecursive languages [Royer and Case 1994], and
'structure-preserving' embeddings for expressiveness [de Boer and Palamidessi
1994; Felleisen 199?; Mitchell 1993; Matsushita 1998].

To further a search for languages suited to realizing domain-specific embedded
languages (DSELs) [Sandewall 1978; Emanuelson and Haraldsson 1980; Hudak 1996]
and "active libraries" [Czarnecki et al. 2000], we propose stage-preserving
embeddings as a tool to study languages in which some evaluation or
simplification is guaranteed to take place at compile time. Such guarantees can
be wielded to realize domain-specific optimizations and safety checks. The
principal result shown here is that if a language has what we call a
'Turing-complete kernel,' it is universal in the sense that any language may be
embedded into it while preserving staging and safety properties.



1.1 Some background on computability 

Throughout this paper we shall rely on some basic notions from computability
theory. We say a set of natural numbers $S \subseteq \mathbb{N}$ is decidable,
or equivalently $\Delta^0_1$, when there exists a Turing machine that, given as
input any $x \in \mathbb{N}$, can decide whether $x \in S$. A set
$S \subseteq \mathbb{N}$ is computably enumerable, or $\Sigma^0_1$, when there
exists a Turing machine that, given input $x \in \mathbb{N}$, will halt exactly
when $x \in S$. (We follow the recommendation of Soare [Soare 1996] that the
traditional term recursively enumerable be retired in favour of the more
descriptive term computably enumerable.) These notions extend easily to sets of
strings and terms by employing an appropriate coding of objects by natural
numbers. For example, strings over a finite alphabet $A$ can be encoded by
treating a string $x \in A^*$ as a base-$|A|$ natural number; we may then speak
of a set of strings over $A$ as computably enumerable or decidable. A function
implemented by a computer is appropriately modelled by a partial function,
since the computation may fail to terminate for some values of the domain. A
partial function $f : \mathbb{N} \to \mathbb{N}$ is computably enumerable, or
$\Sigma^0_1$, when it is computable by a Turing machine; in this case we say
$f$ is a partial computable function.
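The coding of strings by natural numbers mentioned above can be made concrete. The following is a minimal sketch, assuming a bijective base-$|A|$ numeration (digits run over $1..|A|$ rather than $0..|A|-1$) so that strings such as "a" and "aa" receive distinct codes; the function names `encode` and `decode` are illustrative and do not come from the paper.

```python
def encode(s: str, alphabet: str) -> int:
    """Code a string over `alphabet` as a natural number (bijective base-k)."""
    k = len(alphabet)
    n = 0
    for ch in s:
        # Digits range over 1..k, so leading symbols are never lost.
        n = n * k + alphabet.index(ch) + 1
    return n


def decode(n: int, alphabet: str) -> str:
    """Inverse of `encode`: recover the string from its code."""
    k = len(alphabet)
    chars = []
    while n > 0:
        n, d = divmod(n - 1, k)
        chars.append(alphabet[d])
    return "".join(reversed(chars))


def in_even_length_language(code: int, alphabet: str) -> bool:
    """A decidable set of strings, decided through the codes of its members."""
    return len(decode(code, alphabet)) % 2 == 0
```

Because the coding and its inverse are both computable, a decision procedure on codes (such as `in_even_length_language`) is as good as one on the strings themselves, which is all the paper needs from the encoding.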

2. Stage-preserving embeddings 

The formalization of programming languages and compilers is susceptible to
fussiness, and to keep this at bay I propose to be precise where it matters and
vague where it does not. Let us adopt a grossly simplified view, typical of
computability, in which a programming language is merely a set of programs
represented by binary strings. One way to achieve this perspective is to view
the program text (a sequence of characters) as a single, large binary string.
We shall suppose the programming languages of interest are all being compiled
to one implementation language $L_M$, the subscript $M$ suggesting a machine
language. To speak of translations being semantics-preserving, we require that
$L_M$ comes paired with an equivalence $\sim$ on machine language programs
capturing some desired notion of program equivalence; the precise meaning of
$\sim$ does not matter for our purposes. For two programs $p, p' \in L_M$, we
write $p \sim p'$ to mean they do the same thing.

We define programming languages in terms of their compilation to $L_M$.

Definition 1. A programming language is a pair $(L_A, \phi_A)$ with $L_A$ a
decidable set of binary strings representing valid programs, and
$\phi_A : L_A \to L_M$ a compilation map required to be computably enumerable.

Some languages have compilers that do not necessarily terminate; C++ and MetaML
are examples [Böhme and Manthey 2003; Taha and Sheard 2000]. For this reason
compilers are appropriately modelled by computably enumerable partial
functions, rather than total functions. To keep the notational convenience of
total functions we employ the usual device of introducing a special element
$\bot \in L_M$ to indicate a nonterminating compilation, and require that
$\bot$ is in a singleton equivalence class under $\sim$, i.e., $p \sim \bot$ if
and only if $p = \bot$.

Definition 2. A language embedding $e : L_A \to L_B$ is an injective and
computable function that is semantics-preserving, i.e.,
$\phi_A(p) \sim \phi_B(e\,p)$ for all $p \in L_A$.

The typical scenario we shall consider is illustrated by this diagram:

$$
\begin{array}{ccc}
L_A & \xrightarrow{\;e\;} & L_u \\
{\scriptstyle \phi_A} \searrow & & \swarrow {\scriptstyle \phi_u} \\
 & L_M &
\end{array}
\tag{1}
$$

We have two source languages $L_A$ and $L_u$, compilers $\phi_A$ and $\phi_u$
for them, and we consider an embedding $e : L_A \to L_u$. We ask when
embeddings that preserve properties of interest (semantics, staging, safety)
exist. The scenario of special interest is when $L_u$ is some language
purporting to be 'universal.'

We use the notion of stages to address compile-time computations (cf. [Jones et
al. 1993; Taha and Sheard 2000]). We are interested in embeddings that are
stage-preserving: if a computation occurs at compile time in language $L_A$,
then it occurs at compile time in language $L_u$. This can be conveniently
addressed using the kernel of the compiler. Recall that the kernel of a map
$\phi$ is:

$$\ker(\phi) = \{ (p_1, p_2) \mid \phi(p_1) = \phi(p_2) \} \tag{2}$$



The kernel of a compiler is a simple but versatile notion. The kernel is an
equivalence relation; every program in a kernel-equivalence class compiles to
the same target program. Kernels capture staging: from the kernel one can
deduce what compile-time reductions take place. For instance, a language whose
compile-time evaluations are defined by a rewrite relation $\to$ must satisfy
${\to} \subseteq \ker(\phi)$, where $\phi$ is its compiler (Figure 1 shows an
example of some MetaML-like terms). A useful analogy may be drawn to linear
algebra, where the kernel of a linear transformation yields its nullspace. When
a vector is transformed, every component lying in the nullspace is zeroed.
Analogously, any code lying in the kernel of the compiler 'disappears' at
compile time. Thus we can view the kernel as a staging specification and use it
to formalize the notion of a stage-preserving embedding.¹

Definition 3. An embedding $e : L_A \to L_u$ is stage-preserving when it
satisfies $(p_1, p_2) \in \ker(\phi_A) \Rightarrow (e\,p_1, e\,p_2) \in \ker(\phi_u)$.
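To make the kernel and Defn. 3 concrete, here is a minimal sketch over a toy staged language with an escape operator. The term representation, the two toy compilers and the helper names are all assumptions made for illustration; the check covers only the kernel condition on a finite sample of programs, not semantics preservation.

```python
from collections import defaultdict

# Terms of a toy staged language, as nested tuples (illustrative only):
#   ("var", name), ("lit", n), ("add", t1, t2), ("sub", t1, t2), ("esc", t)
# ("esc", t) stands for ~(t): t is meant to be evaluated at compile time.


def evaluate(term):
    """Evaluate a closed arithmetic term to an integer."""
    tag = term[0]
    if tag == "lit":
        return term[1]
    if tag == "esc":
        return evaluate(term[1])
    if tag in ("add", "sub"):
        a, b = evaluate(term[1]), evaluate(term[2])
        return a + b if tag == "add" else a - b
    raise ValueError("term is not closed arithmetic")


def compile_staged(term):
    """Toy compiler that reduces escaped subterms at compile time."""
    tag = term[0]
    if tag == "var":
        return term[1]
    if tag == "lit":
        return str(term[1])
    if tag == "esc":
        return str(evaluate(term[1]))
    op = "IADD" if tag == "add" else "ISUB"
    return f"{op} {compile_staged(term[1])} {compile_staged(term[2])}"


def compile_unstaged(term):
    """Toy compiler that ignores staging: escapes are compiled, not evaluated."""
    tag = term[0]
    if tag == "var":
        return term[1]
    if tag == "lit":
        return str(term[1])
    if tag == "esc":
        return compile_unstaged(term[1])
    op = "IADD" if tag == "add" else "ISUB"
    return f"{op} {compile_unstaged(term[1])} {compile_unstaged(term[2])}"


def kernel_classes(programs, compiler):
    """Group programs into the equivalence classes of ker(compiler)."""
    classes = defaultdict(list)
    for p in programs:
        classes[compiler(p)].append(p)
    return list(classes.values())


def is_stage_preserving(embed, programs, phi_a, phi_u):
    """Check the implication of Defn. 3 on a finite sample of programs."""
    return all(
        phi_u(embed(p1)) == phi_u(embed(p2))
        for p1 in programs
        for p2 in programs
        if phi_a(p1) == phi_a(p2)
    )


x, y = ("var", "x"), ("var", "y")
lit = lambda n: ("lit", n)
PROGRAMS = [
    ("add", x, lit(2)),
    ("add", x, ("esc", ("add", lit(1), lit(1)))),
    ("add", x, ("esc", ("add", lit(1), ("sub", lit(2), lit(1))))),
    ("add", y, lit(2)),
    ("add", y, ("esc", ("sub", lit(4), lit(2)))),
]
identity = lambda p: p
```

On this sample, `kernel_classes(PROGRAMS, compile_staged)` yields one class of three programs and one of two. The identity map from the staged language into the unstaged one fails the check, since the target compiler's kernel is too small, whereas the identity map in the other direction passes.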

¹ The kernel is related to, but different from, binding-time specifications
(cf. [Jones 1996; Jones et al. 1993]): the kernel indicates which programs will
compile to the same target program, whereas binding-times indicate which terms
are replaceable by constants. These two ideas coincide in some situations,
e.g., when programs are terms, the compilation map is compositional, and only
partial evaluation is taking place.

    L_A programs (source)          L_M programs (target)

    x + 2
    x + ~(1 + 1)            -->    IADD x 2
    x + ~(1 + (2 - 1))

    y + 2                   -->    IADD y 2
    y + ~(4 - 2)

Fig. 1: Illustration of the kernel of a compiler $\phi$ for some terms in a
hypothetical staged language with the escape operator ~(). Expressions enclosed
by ~() are evaluated at compile time. The kernel gives equivalence classes of
source programs that map to the same compiled program; in this case
$\ker(\phi)$ yields the equivalence classes
{{x + 2, x + ~(1 + 1), x + ~(1 + (2 - 1))}, {y + 2, y + ~(4 - 2)}}.

Figure 2 illustrates. The kernel of a compiler gives us a measure of its
staging power, that is, its ability to reduce computations at compile time.
Defn. 3 effectively says: to increase the staging power of a language, make its
kernel larger. But at what point is a kernel "big enough" that we can embed any
language into it and preserve staging? To answer this, let us order languages,
writing $L_A \le_S L_B$ to mean there exists a stage-preserving embedding
$e : L_A \to L_B$. The relation $\le_S$ is a preorder, i.e., reflexive (via the
identity embedding) and transitive (stage-preserving embeddings compose), but
not necessarily anti-symmetric. Given languages $L_A, L_B, L_C, L_D, \ldots$ we
might have the following diagram of $\le_S$, with arrows indicating the
existence of stage-preserving embeddings:



[Diagram: an example of the preorder $\le_S$ among the languages
$L_A, L_B, L_C, L_D, \ldots$]

The obvious question is whether there might exist languages maximal in the
order $\le_S$; we call such languages stage-universal.

Fig. 2: Illustration of stage-preserving embedding. If two programs in $L_A$
compile to the same program in $L_M$, then after embedding in $L_u$ they must
still compile to the same program. Note, though, that it is not required that
$\phi_A(p) = \phi_u(e\,p)$, i.e., we do not expect to get the same target
program going either route, though this would be agreeable should it happen.



Definition 4. A programming language is stage-universal when there is a
stage-preserving embedding of any other programming language into it.

The term stage-complete would do equally well. Now let us show that such
languages exist and have a useful characterization. We shall construct such a
language and refer to it as $L_u$, the subscript here indicating universal. The
universal language $L_u$ is required to provide some standard features of
programming languages:

(1) We assume there is an effective coding $\ulcorner \cdot \urcorner$ of the
    languages $L_A$, $L_M$ in $L_u$; this means we can represent a program in
    $L_A$ by some term or computation in the language $L_u$, and thereby
    examine and manipulate it. If $p \in L_A$ is a program then
    $\ulcorner p \urcorner$ may be thought of as a representation of $p$ by its
    parse tree, as a string of characters, or (more traditionally) a very large
    natural number; the particulars do not matter so long as the encoding is
    unique and computable.

(2) We shall want to manipulate representations of programs in $L_u$, so we
    assume $L_u$ permits the construction of functions over codes (e.g.,
    functions that manipulate parse trees), and write $F(c)$ to mean the
    application of such a function $F$ to a code $c$. It is useful to
    distinguish between functions implemented in $L_u$, e.g., purely functional
    manipulations of coded programs, and programs such as interpreters that
    take such codes and produce behaviour. For a program $P$ taking as argument
    some code $x$, we write $P[x]$.

(3) We assume $L_u$ has function composition:

    o If there are $L_u$-functions $F$ and $G$, then there is an
      $L_u$-function $F \circ G$.

    o If there is a program $P[\cdot]$ and an $L_u$-function $F(\cdot)$, then
      the construction $P[F(\cdot)]$ is meaningful: there is some program
      $P_F[\cdot]$ such that $P_F[y] \sim P[x]$ when $x = F(y)$.

Much of what follows relies on the ability to interpret $L_M$ programs in $L_u$.

Definition 5. An interpreter for the machine language $L_M$ in the language
$L_u$ is a program $I_M[\cdot]$ such that for every machine-language program
$p_m \in L_M$, the interpreted version of $p_m$ is equivalent to $p_m$:

$$\phi_u(I_M[\ulcorner p_m \urcorner]) \sim p_m \tag{3}$$

That is, if we take some machine-language program $p_m$ and 'code' it as (for
example) a syntax tree $\ulcorner p_m \urcorner$ and give it to the interpreter
$I_M$, then $I_M$ running $\ulcorner p_m \urcorner$ behaves the same way as the
program $p_m$. The existence of such an interpreter ensures that the language
$L_u$ does not lose basic capabilities of the language $L_M$, such as the
ability to interact with the operating system and so forth. This is of concern
when dealing with interactive systems (a.k.a. processes, reactive systems,
etc.) rather than purely functional programs. The existence of such an
interpreter guarantees that $\phi_u$ is onto the equivalence classes
$L_M/{\sim}$ giving the possible behaviours of $L_M$ programs. That is, for
every machine-language program $p_m \in L_M$, there is a program
$p_u \in L_u$ such that $p_u$ is indistinguishable in behaviour from $p_m$,
i.e., $\phi_u(p_u) \sim p_m$.

What we need next is some vocabulary to discuss compile-time computations in
the language $L_u$. We work from the assumption stated earlier that $L_u$ has a
mechanism for defining functions.

Definition 6. A partial function $f$ is 'realizable in the kernel' of $\phi_u$
if there exists an $L_u$ function $F$ such that for any program $P$ taking as
argument a code, and for any $x, y$ such that $y = f(x)$:

$$\phi_u(P[F(\ulcorner x \urcorner)]) = \phi_u(P[\ulcorner y \urcorner]) \tag{4}$$

Or, equivalently,
$(P[F(\ulcorner x \urcorner)], P[\ulcorner y \urcorner]) \in \ker(\phi_u)$.

This means, more or less, that the function $F$ is evaluated at compile time.

We now give a sufficient condition for stage-universality, inspired by ideas
from partial evaluation, in particular Jones-optimality [Jones et al. 1993] and
the Futamura projections [Futamura 1971]. The proof is boilerplate
computability theory and partial evaluation. We rely heavily on the assumption
(stated earlier) that compilers are functions.

Theorem 1. If

(i) there is an interpreter $I_M[\cdot]$ for $L_M$ in $L_u$; and

(ii) any $\Sigma^0_1$ function $f$ is realizable in the kernel of $\phi_u$,

then the language $L_u$ is stage-universal.

Proof. Pick a language and compiler $L_A$ and $\phi_A$. Since $\phi_A$ is
$\Sigma^0_1$, by (ii) there is an $L_u$-function $\Phi_A$ realizing it such
that if $p_m = \phi_A(p_a)$ then
$\phi_u(P[\Phi_A(\ulcorner p_a \urcorner)]) = \phi_u(P[\ulcorner p_m \urcorner])$
for any program $P$ taking a code argument. Consider the embedding
$e : L_A \to L_u$ given by:

$$e(p_a) = I_M[\Phi_A(\ulcorner p_a \urcorner)] \tag{5}$$

where $I_M[\cdot]$ is the $L_M$ interpreter whose existence is ensured by (i).
The map $e$ is injective because the coding is unique, and it is
semantics-preserving because
$\phi_u(e\,p_a) = \phi_u(I_M[\ulcorner p_m \urcorner]) \sim p_m = \phi_A(p_a)$
by (3). Recall from Defn. 3 that $e$ is stage-preserving when
$(p_1, p_2) \in \ker(\phi_A) \Rightarrow (e\,p_1, e\,p_2) \in \ker(\phi_u)$.
Choose $p_1, p_2$ such that $(p_1, p_2) \in \ker(\phi_A)$. Then there is a
$p_m$ such that $\phi_A(p_1) = \phi_A(p_2) = p_m$, and from the choice of
$\Phi_A$,

$$
\phi_u(I_M[\Phi_A(\ulcorner p_1 \urcorner)]) = \phi_u(I_M[\ulcorner p_m \urcorner])
\quad\text{and}\quad
\phi_u(I_M[\Phi_A(\ulcorner p_2 \urcorner)]) = \phi_u(I_M[\ulcorner p_m \urcorner])
\tag{6}
$$

Therefore $\phi_u(e\,p_1) = \phi_u(e\,p_2)$, or
$(e\,p_1, e\,p_2) \in \ker(\phi_u)$, and the embedding $e$ is stage-preserving.
Since such an embedding exists for any language $L_A$, the language $L_u$ is
stage-universal. □

We shall be sloppy henceforth and refer to a "Turing-complete kernel" to mean
the properties listed in Theorem 1.

The construction in the proof above is not of immediate practical use; there is
no guarantee that an interpreted program
$\phi_u(I_M[\Phi_A(\ulcorner p \urcorner)])$ will run anywhere near as fast as
$\phi_A(p)$ (cf. Jones-optimality [Jones et al. 1993]). It does, however, give
sufficient conditions for languages to be stage-universal:

    A language with a Turing-complete kernel can, in principle, subsume any
    staged language.

This suggests we look to such languages to realize DSELs and 'active
libraries.' The construction above would be useful if $\phi_u$ found programs
that were 'optimal.' That is, if the compiler $\phi_u$ were to find fastest,
smallest, etc. programs, then the construction
$\phi_u(I_M[\Phi_A(\ulcorner p \urcorner)])$ would be practical. Finding
optimal programs is undecidable, so this goal is not reachable. However, if we
find programs that are near to optimal, then approaches nearing the
construction of Theorem 1 might be practical. In [Veldhuizen 2004] one possible
method for realizing such compilers is described, using "Guaranteed
Optimization," a new compiler design technique.
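The construction $e(p_a) = I_M[\Phi_A(\ulcorner p_a \urcorner)]$ used in the proof of Theorem 1 can be sketched in a toy model. Everything below is an illustrative assumption rather than the paper's formalism: $L_A$ programs are sums written as strings, $L_u$ programs are small tuples, and the toy $\phi_u$ evaluates every function over codes at compile time, which stands in for condition (ii). The toy $\phi_A$ is total; for a merely partial $\Phi_A$ the compilation would diverge, corresponding to $\bot$.

```python
def phi_a(source: str) -> str:
    """Toy compiler for L_A: a sum such as "1+1" folds to one instruction."""
    total = sum(int(tok) for tok in source.split("+"))
    return f"PUSH {total}"


# L_u programs are small syntax trees (hypothetical representation):
#   ("code", c)      a quoted program, i.e. a code
#   ("apply", F, t)  an L_u-function F applied to a code-valued term t
#   ("interp", t)    the interpreter I_M applied to a code-valued term t


def reduce_codes(term):
    """Compile-time reduction: evaluate function applications over codes."""
    tag = term[0]
    if tag == "code":
        return term
    if tag == "apply":
        _, func, arg = term
        return ("code", func(reduce_codes(arg)[1]))
    if tag == "interp":
        return ("interp", reduce_codes(term[1]))
    raise ValueError(f"unknown term: {tag}")


def phi_u(term) -> str:
    """Toy compiler for L_u: reduce codes first, then emit target text."""
    reduced = reduce_codes(term)
    if reduced[0] == "interp":
        # Models an interpreted program: equivalent to, not equal to, p_m.
        return f"INTERP[{reduced[1][1]}]"
    return f"DATA[{reduced[1]}]"


def embed(source: str):
    """The embedding e(p_a) = I_M[Phi_A(code(p_a))] in this toy model."""
    return ("interp", ("apply", phi_a, ("code", source)))
```

In this model "1+1" and "2" lie in the same class of $\ker(\phi_A)$, and their embeddings compile to the same $L_u$ output `INTERP[PUSH 2]`, mirroring the step from (6) to the conclusion of the proof. The `INTERP[...]` wrapper is a reminder that the result is only equivalent to $\phi_A(p)$, with no claim about its speed.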



3. Safety-preserving embeddings 



Let us now turn to the question of when there exist language embeddings that
preserve judgments about safety properties.

Since useful safety properties are often undecidable, compilers approximate the
set of safe programs in a conservative way. For example, many compilers
incorporate a static typing phase that determines whether programs are
well-typed in some formalism; programs that fail typing are rejected since they
might be unsafe. When embedding one language into another, it is important that
the set of programs judged to be safe is preserved. In particular we must avoid
the possibility that a language embedding might allow us to run programs that
fail safety checks in the source language.

In the real world, compilers react to programs they judge unsafe by producing
no output program and issuing a variety of diagnostic messages. For ease of
modelling, let us suppose compilers have one designated output
$\mathsf{unsafe} \in L_M$ signifying a program that fails safety checks. The
intent is that a compiler $\phi$ judges a program $p$ to be unsafe exactly when
$\phi(p) = \mathsf{unsafe}$. There is then an obvious sense in which an
embedding can be safety-preserving.

Definition 7. A safety-preserving embedding $e : L_A \to L_u$ is a
semantics-preserving embedding that preserves the set of programs judged
unsafe, i.e., $\phi_u(e\,p) = \mathsf{unsafe}$ if and only if
$\phi_A(p) = \mathsf{unsafe}$.

We require that no programs in $L_M$ are equivalent to $\mathsf{unsafe}$ except
$\mathsf{unsafe}$ itself, i.e., $\mathsf{unsafe}$ has a singleton equivalence
class under $\sim$. This means, incidentally, that semantics-preserving
(Defn. 2) implies safety-preserving.

Following a similar line of reasoning as before, we ask whether there are
languages that are safety-universal, in the sense that any language may be
embedded into them while preserving safety. There are two approaches we explore
here. The first is to note an obvious, but somewhat unenlightening, corollary
of Theorem 1:

Corollary 1. Any language meeting the criteria of Theorem 1 is
safety-universal.

This follows because stage-preserving embeddings are semantics-preserving, and
from the way we defined the special compiler output $\mathsf{unsafe}$, any
stage-preserving transformation is safety-preserving (Defn. 7). Therefore any
stage-universal language is also safety-universal.

For a more informative construction, let us consider compilers that employ a
preliminary safety checking phase. We presume this safety checking phase
implements a proof calculus $\vdash$ making judgments of the form
$\vdash \mathsf{safe}(p)$, indicating the program $p$ is safe. This is a
general framework that subsumes, for example, type systems; we can augment a
typical type inference system with an additional rule of the form:

$$\frac{\vdash p : \tau}{\vdash \mathsf{safe}(p)}$$

This states that if a program $p$ can be given a type $\tau$, then it is safe.
We limit ourselves to effective proof calculi, i.e., those whose deductions are
computably enumerable, and in particular to relations $\mathsf{safe}(p)$ that
are decidable. We will write $\nvdash \mathsf{safe}(p)$ to mean
"$\mathsf{safe}(p)$ is not a valid deduction of $\vdash$."

Theorem 2. Let $L_A, \phi_A$ be a language and its compiler, and $\vdash$ be a
proof calculus with judgments of the form $\vdash \mathsf{safe}(p)$ for some
$p \in L_A$, such that the set $\{ p \mid \vdash \mathsf{safe}(p) \}$ is
decidable. Let $L_u, \phi_u$ be a language and compiler meeting the criteria of
Theorem 1. Then there is a stage-preserving embedding $e : L_A \to L_u$ such
that $\phi_u(e\,p) = \mathsf{unsafe}$ if and only if $\nvdash \mathsf{safe}(p)$.
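The idea behind Theorem 2 is to guard the source compiler with the decidable safety judgment and then embed the guarded compiler as in Theorem 1. A minimal sketch in a toy model follows; the source language (sums written as strings), the safety rule (operands must fit in a byte) and all function names are assumptions chosen purely for illustration.

```python
UNSAFE = "unsafe"  # designated compiler output for rejected programs


def phi_a(source: str) -> str:
    """Toy compiler for L_A: folds a sum such as "1+1" into one instruction."""
    return f"PUSH {sum(int(tok) for tok in source.split('+'))}"


def is_safe(source: str) -> bool:
    """Stand-in for a decidable judgment |- safe(p): operands fit in a byte."""
    return all(0 <= int(tok) <= 255 for tok in source.split("+"))


def phi_a_checked(source: str) -> str:
    """The guarded compiler: compile safe programs, reject all others."""
    return phi_a(source) if is_safe(source) else UNSAFE


def phi_u(term) -> str:
    """Toy compiler for L_u; terms have the shape ("interp", ("apply", F, code))."""
    _, (_, func, code) = term
    target = func(code)  # F is evaluated at compile time
    # Interpreting the code of `unsafe` must itself compile to `unsafe`,
    # since `unsafe` is alone in its equivalence class.
    return UNSAFE if target == UNSAFE else f"INTERP[{target}]"


def embed(source: str):
    """Embedding built from the guarded compiler instead of phi_a."""
    return ("interp", ("apply", phi_a_checked, source))
```

Here `phi_u(embed(p))` is `UNSAFE` exactly when `is_safe(p)` is false, and programs with equal guarded compilations still compile to equal outputs after embedding. The sketch relies on the compile-time function being able to return the code of `unsafe`.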

Proof. Consider the function $\phi'_A : L_A \to L_M$ given by:

$$
\phi'_A(p) =
\begin{cases}
\phi_A(p) & \text{if } \vdash \mathsf{safe}(p) \\
\mathsf{unsafe} & \text{otherwise}
\end{cases}
$$

Since the set $\{ p \mid \vdash \mathsf{safe}(p) \}$ is decidable, i.e.
$\Delta^0_1$, and $\phi_A$ is $\Sigma^0_1$, the function $\phi'_A$ is
$\Sigma^0_1$. By the conditions of Theorem 1, there exists an $L_u$-function
$\Phi'_A$ realizing $\phi'_A$ in the kernel of $\phi_u$. Consider the embedding

$$e(p_a) = I_M[\Phi'_A(\ulcorner p_a \urcorner)]$$

Following the reasoning given in the proof of Theorem 1,
$\phi_u(e\,p_a) = \mathsf{unsafe}$ if and only if
$\nvdash \mathsf{safe}(p_a)$, and $e$ is a stage-preserving embedding. □

A key requirement, implicit in the above proof, is that the function $\Phi'_A$
must be able to produce $\ulcorner \mathsf{unsafe} \urcorner$, i.e., the code
of an unsafe program. The intuition we can draw from this is the following:

    Any language with a Turing-complete kernel and the ability to construct at
    compile time a condition signifying "unsafe program" is safety-universal.

4. Conclusions

Variations on extensible and universal programming languages have been explored
for decades. We have examined a new twist on this theme, looking not just to
languages that are Turing-complete (can perform any effective procedure) or
syntactically extensible (can provide a domain-specific syntax), but to
languages that are universal with respect to staging and safety. Such languages
appear ideal for expressing domain-specific safety checks and optimizations,
suggesting a route to realizing libraries and DSELs that are not only
expressive, but also fast and safe.